Purrr and functions

Part 2: tidy data for functional programming and iteration

John Little

Duke University Libraries

Center for Data & Visualization Sciences

2023-09-28

Review

Last week

  • Tall versus wide data frames (tibbles)
    • Why? iterate row-by-row without FOR Loops
      • Example: ggplot2::facet_wrap()
    • How? tidyr::pivot_longer() | tidyr::pivot_wider()
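
As a quick refresher, the two pivot functions can be sketched on a tiny tibble. The `wide` data below is made up for illustration and is not from the workshop materials.

```r
library(tidyr)

# A small, hypothetical wide tibble: one column per year
wide <- tibble::tibble(
  country = c("A", "B"),
  `2021`  = c(10, 20),
  `2022`  = c(11, 21)
)

# Wide -> tall: one row per country-year pair
tall <- wide |>
  pivot_longer(cols = -country, names_to = "year", values_to = "value")

# Tall -> wide: back to one column per year
wide_again <- tall |>
  pivot_wider(names_from = year, values_from = value)
```

In the tall form every observation is its own row, which is the shape that functions such as ggplot2::facet_wrap() expect.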

Summary Review

R can feel like a different galaxy

  • Day 1. Importing data and orientation to R / RStudio / Quarto

  • Day 2. Using verbs, i.e. {dplyr} functions, to subset and wrangle data

    • grammar of data
  • Day 3. Using {ggplot2}

    • grammar of graphics
  • Day 4. Generative-AI-assisted coding

  • Day 5. pivot and join

    • Tall data so I don’t have to write a FOR Loop
  • Day 6. iteration and functions

Functions

What is R

  • A data-first, functional, programming language

  • In a functional programming language, you rarely have to write a for loop

  • Rule of Thumb: If you compose an expression three or more times, write a function
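
A minimal sketch of the rule of thumb, assuming three made-up vectors `a`, `b`, and `d`. The helper name `rescale01` is hypothetical; any repeated expression could be wrapped the same way.

```r
a <- c(1, 5, 10)
b <- c(2, 4, 8)
d <- c(0, 50, 100)

# The same expression typed three times: easy to mistype one copy
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))
d_scaled <- (d - min(d)) / (max(d) - min(d))

# Written once as a function, then reused
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

rescale01(d)  # 0.0 0.5 1.0
```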

Everything in R


Every object is either a vector or a function


my_vector <- 7:14   # an integer (numeric) vector
dplyr::select       # a function from the dplyr package (no parentheses: refer to it without calling it)

Vectors are data

All data are vectors

Data types

  • Vectors

  • Data Frames (i.e. Tibbles) are 2 dimensional: a list of equal-length column vectors

  • Lists

  • Matrices
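
The data types above can be inspected with a few base-R checks. This is only a sketch; the object names `v`, `l`, `m`, and `df` are arbitrary.

```r
v  <- 7:14                                        # atomic vector
l  <- list(1, "a", TRUE)                          # list: elements may differ in type
m  <- matrix(1:6, nrow = 2)                       # matrix: a vector with dimensions
df <- data.frame(x = 1:3, y = c("a", "b", "c"))   # data frame

is.atomic(v)       # TRUE
is.list(l)         # TRUE
dim(m)             # 2 3
is.list(df)        # TRUE: each column is one element of the list
typeof(df)         # "list"
is.function(mean)  # TRUE
```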

Many Tidyverse functions are vectorized



You can imply a for loop


library(tidyverse)

starwars |> 
  mutate(name_lc = str_to_lower(name), .after = name)
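
To see what the vectorized call spares you, here is a sketch comparing it with an explicit loop. The short `character_names` vector is a stand-in for the `name` column.

```r
library(stringr)

character_names <- c("Luke Skywalker", "C-3PO", "R2-D2")

# Vectorized: one call handles every element
vectorized <- str_to_lower(character_names)

# The explicit loop that the vectorized call implies
looped <- character(length(character_names))
for (i in seq_along(character_names)) {
  looped[i] <- str_to_lower(character_names[i])
}

identical(vectorized, looped)  # TRUE
```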

Custom Functions


add_numbers <- function(x, y) {
  x + y
}

starwars |> 
  select(where(is.numeric)) |> 
  mutate(my_sum = add_numbers(height, mass))

What is a FOR Loop


A control-flow statement


for (variable in sequence) {
    expression
}
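
A concrete instance of the template, as a minimal sketch: here the variable is `i`, the sequence is `1:5`, and the expression updates a running total.

```r
# Sum the squares of 1 to 5 with an explicit loop
total <- 0
for (i in 1:5) {
  total <- total + i^2
}
total         # 55

# The vectorized equivalent, with no loop to write
sum((1:5)^2)  # 55
```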

Environment variables and data variables

make_scatterplot_with_vars <- function(my_df, my_x, my_y) {
  my_df |> 
    ggplot(aes({{my_x}}, {{my_y}})) + 
    geom_point()
}

starwars |> 
  filter(mass < 500) |> 
  make_scatterplot_with_vars(my_x = height, my_y = mass)

starwars |> 
  filter(mass < 500) |> 
  make_scatterplot_with_vars(height, birth_year)

cars |> 
  make_scatterplot_with_vars(speed, dist)
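
The same embracing pattern works in dplyr verbs. In the sketch below, `my_df` is an environment variable (an object you created), while `my_group` and `my_var` stand for data variables (columns inside the data frame), so they are wrapped in `{{ }}`. The helper name `mean_by_group` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: mean of one column within groups of another
mean_by_group <- function(my_df, my_group, my_var) {
  my_df |>
    group_by({{ my_group }}) |>
    summarise(avg = mean({{ my_var }}, na.rm = TRUE), .groups = "drop")
}

height_by_gender <- starwars |>
  mean_by_group(gender, height)

height_by_gender
```

Without the braces, R would look for objects named `my_group` and `my_var` as columns of the data frame instead of using the columns passed in.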

{Purrr}

map()


Apply a function to each element of a vector or list


You can also do this in base R with apply(), sapply(), mapply(), lapply()
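
A minimal sketch of map() and two close relatives, next to base-R counterparts. The list `my_list` is made up for illustration.

```r
library(purrr)

my_list <- list(a = 1:3, b = 4:6, c = 7:9)

# map() always returns a list, one element per input element
means_list <- map(my_list, mean)

# map_dbl() returns a named double vector: a = 2, b = 5, c = 8
means_dbl <- map_dbl(my_list, mean)

# An anonymous function works too
sums_of_squares <- map_dbl(my_list, \(x) sum(x^2))

# Base-R counterparts
means_lapply <- lapply(my_list, mean)   # a list, like map()
means_sapply <- sapply(my_list, mean)   # simplified to a vector
```

One design difference: the typed variants such as map_dbl() fail if a result is not the expected type, whereas sapply() changes its return type depending on the results.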

nest()

   name               gender    height mass
 1 Luke Skywalker     masculine    172   77
 2 C-3PO              masculine    167   75
 3 R2-D2              masculine     96   32
 4 Darth Vader        masculine    202  136
 5 Leia Organa        feminine     150   49
 6 Owen Lars          masculine    178  120
 7 Beru Whitesun lars feminine     165   75
 8 R5-D4              masculine     97   32
 9..86
87 Padmé Amidala      feminine     165   45

starwars |>
  select(name, gender, height, mass) |>
  nest(my_data_by_gender = -gender)

# A tibble: 3 × 2
  gender    my_data_by_gender
  <chr>     <list>           
1 masculine <tibble [66 × 3]>
2 feminine  <tibble [17 × 3]>
3 <NA>      <tibble [4 × 3]>


   gender   name               height mass
 1 feminine Leia Organa           150   49
 2 feminine Beru Whitesun lars    165   75
 3 feminine Mon Mothma            150   NA
 4..16
17 feminine Padmé Amidala         165   45
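
Once the data are nested, map() can run a function on each nested tibble. This is a sketch, assuming the same starwars columns as above; the new column names `n_characters` and `mean_height` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_gender <- starwars |>
  select(name, gender, height, mass) |>
  nest(my_data_by_gender = -gender) |>
  mutate(
    # one integer per nested tibble: how many rows it holds
    n_characters = map_int(my_data_by_gender, nrow),
    # one double per nested tibble: its mean height, ignoring NAs
    mean_height  = map_dbl(my_data_by_gender, \(df) mean(df$height, na.rm = TRUE))
  )

by_gender
```

Each row of `by_gender` is one group, so the iteration happens row by row without a for loop.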